Efficient Iterative Policy Optimization
Author
Abstract
We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates.
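Here is a minimal sketch of the first idea. It rests on simplifying assumptions of our own, not the paper's setting: a one-step problem (a K-armed bandit) with known rewards, and a policy parameterized directly by its action probabilities. One standard way to get a concave lower bound that is tight at the current policy π_k is to apply the inequality x ≥ 1 + log x to x = π(a)/π_k(a). This gives J(π) = Σ_a π(a) r_a ≥ Σ_a π_k(a) r_a [1 + log(π(a)/π_k(a))], valid whenever every r_a ≥ 0. The right-hand side is concave in π and equals J(π_k) at π = π_k. In this toy problem it also has a closed-form maximizer, so each update cannot decrease J. The function names (`expected_reward`, `lower_bound`, `maximize_bound`, `iterate`) are hypothetical, and the paper's actual bound and sample-based estimator may differ. The sketch does not reproduce the extension to negative rewards; it rejects them instead. That choice illustrates why subtracting a baseline (a control variate) breaks this particular bound.

```python
import numpy as np


def expected_reward(pi, rewards):
    """J(pi) = sum_a pi[a] * r[a] for a one-step (bandit) problem."""
    return float(pi @ rewards)


def lower_bound(pi_new, pi_old, rewards):
    """Surrogate L_old(pi_new) = sum_a pi_old[a] r[a] (1 + log(pi_new[a] / pi_old[a])).

    Follows from x >= 1 + log(x) with x = pi_new[a] / pi_old[a]; it lower-bounds
    J(pi_new) only if every r[a] >= 0, and it equals J(pi_old) when pi_new == pi_old.
    """
    w = pi_old * rewards
    mask = w > 0  # zero-weight arms contribute nothing to the surrogate
    ratio = pi_new[mask] / pi_old[mask]
    return float(np.sum(w[mask] * (1.0 + np.log(ratio))))


def maximize_bound(pi_old, rewards):
    """Closed-form argmax of the surrogate over the probability simplex.

    Maximizing sum_a w[a] log pi[a] subject to sum_a pi[a] = 1 (with w >= 0)
    gives pi[a] = w[a] / sum(w), where w[a] = pi_old[a] * r[a].
    """
    if np.any(rewards < 0):
        # The x >= 1 + log(x) bound no longer holds term-by-term for negative
        # rewards (e.g. after subtracting a baseline); the paper's extension
        # to negative rewards is not reproduced in this sketch.
        raise ValueError("this sketch assumes non-negative rewards")
    w = pi_old * rewards
    total = w.sum()
    if total == 0:
        # No arm with positive weight: the surrogate is constant, keep the policy.
        return pi_old.copy()
    return w / total


def iterate(rewards, num_updates):
    """Run num_updates minorize-maximize steps starting from the uniform policy."""
    rewards = np.asarray(rewards, dtype=float)
    pi = np.full(len(rewards), 1.0 / len(rewards))
    history = [expected_reward(pi, rewards)]
    for _ in range(num_updates):
        # J(new) >= L_old(new) >= L_old(old) = J(old), so J never decreases.
        pi = maximize_bound(pi, rewards)
        history.append(expected_reward(pi, rewards))
    return pi, history


if __name__ == "__main__":
    final_pi, rewards_seen = iterate([1.0, 2.0, 3.0], num_updates=5)
    print(np.round(final_pi, 3), [round(j, 3) for j in rewards_seen])
```

With rewards [1, 2, 3] and a uniform start, each update multiplies the probability of arm a by r_a and renormalizes. The expected reward therefore goes 2 → 14/6 ≈ 2.33 → 36/14 ≈ 2.57 and approaches 3. Exact maximization is possible here only because the toy problem is fully known; with sampled trajectories the bound would presumably have to be estimated and maximized numerically.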
Similar resources
Sample Complexity Bounds for Iterative Stochastic Policy Optimization
This paper is concerned with robustness analysis of decision making under uncertainty. We consider a class of iterative stochastic policy optimization problems and analyze the resulting expected performance for each newly updated policy at each iteration. In particular, we employ concentration-of-measure inequalities to compute future expected cost and probability of constraint violation using ...
An Efficient Heuristic Optimization Algorithm for a Two-Echelon (R, Q) Inventory System
This paper presents a two-echelon non-repairable spare parts inventory system that consists of one warehouse and m identical retailers and implements the reorder point, order quantity (R, Q) inventory policy. We formulate the policy decision problem in order to minimize the total annual inventory investment subject to average annual ordering frequency and expected number of backorder constraint...
Bilateral Teleoperation Systems Using Backtracking Search Optimization Algorithm Based Iterative Learning Control
This paper deals with the application of Iterative Learning Control (ILC) to further improve the performance of teleoperation systems based on the Smith predictor. The goal is to achieve robust stability and optimal transparency for these systems. The proposed control structure makes the slave manipulator follow the master in spite of uncertainties in time delay in communication channel and model pa...
Optimization of Agricultural BMPs Using a Parallel Computing Based Multi-Objective Optimization Algorithm
Beneficial Management Practices (BMPs) are important measures for reducing agricultural non-point source (NPS) pollution. However, selecting BMPs for placement in a watershed requires optimizing available resources to maximize possible water quality benefits. Due to its iterative nature, the optimization typically takes a long time to achieve the BMP trade-off results, which is not desirable ...